lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers

Authors

  • Gilad Lerman
  • Teng Zhang
Abstract

We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric outliers. We study the recovery of the global l0 subspace (i.e., the one containing the largest number of points) by minimizing the lp-averaged distances of data points from d-dimensional subspaces of R^D, where 0 < p ∈ R. Unlike other lp minimization problems, this minimization is non-convex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then the global l0 subspace can be recovered by lp minimization with overwhelming probability (which depends on the generating distribution and its parameters). Moreover, when homoscedastic noise is added around the underlying subspaces, the generalized l0 subspace (the one with the largest number of points "around it") can, with overwhelming probability, be nearly recovered by lp minimization with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the global l0 subspace cannot be recovered and the generalized one cannot even be nearly recovered.
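The objective described above is concrete enough to sketch: for a candidate d-dimensional subspace with orthonormal basis B, sum the lp powers of the orthogonal distances of the data points to span(B). The toy data and the helper below are illustrative only (they are not the authors' algorithm or experiments); they show that with p ≤ 1 the subspace holding the most points attains the smaller cost in a simple two-subspace mixture.

```python
import numpy as np

def lp_subspace_cost(X, B, p):
    """Sum of lp-powered distances from the rows of X to the subspace
    spanned by the orthonormal columns of B."""
    residual = X - X @ B @ B.T          # component orthogonal to span(B)
    dists = np.linalg.norm(residual, axis=1)
    return np.sum(dists ** p)

# Toy data in R^2: six points on the x-axis (the "l0 subspace",
# holding the most points) and three points on the y-axis.
X = np.array([[1.0, 0], [2, 0], [3, 0], [4, 0], [5, 0], [6, 0],
              [0, 1], [0, 2], [0, 3]])
B_x = np.array([[1.0], [0.0]])          # candidate subspace: x-axis
B_y = np.array([[0.0], [1.0]])          # candidate subspace: y-axis

for p in (0.5, 1.0):                    # for p <= 1 the majority subspace wins
    assert lp_subspace_cost(X, B_x, p) < lp_subspace_cost(X, B_y, p)
```

Note that this only evaluates the cost at two fixed candidates; the paper's minimization ranges over all d-dimensional subspaces, which is the non-convex part.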


Similar Articles

Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation

In this work, we address the following matrix recovery problem: suppose we are given a set of data points consisting of two parts, one drawn from a union of multiple subspaces and the other consisting of outliers. We do not know which data points are outliers or how many outliers there are, and neither the rank nor the number of the subspaces is known. Can we detect the outl...
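In the Low-Rank Representation (LRR) framework this abstract summarizes, the data matrix X is decomposed as X = XZ + E by minimizing the nuclear norm of Z plus a weighted l2,1 norm of E, and sample-specific outliers surface as columns of E with large l2 norm. The snippet below sketches only that final detection step, given a recovered E; the `ratio` threshold is a hypothetical heuristic, not part of the published method.

```python
import numpy as np

def flag_outlier_columns(E, ratio=0.5):
    """Flag columns of the LRR error term E whose l2 norm exceeds
    `ratio` times the largest column norm (hypothetical heuristic)."""
    norms = np.linalg.norm(E, axis=0)   # column-wise l2 norms
    return norms > ratio * norms.max()

# A toy error term with one dominant column (index 2) playing the outlier.
E = np.zeros((4, 5))
E[:, 2] = [3.0, 4.0, 0.0, 0.0]          # column norm 5; all others are 0
flags = flag_outlier_columns(E)
assert list(flags) == [False, False, True, False, False]
```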


A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets

Many real applications require detecting outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormal sparse density units in s...


Robust Subspace Outlier Detection in High Dimensional Space

Rare data in a large-scale database, called outliers, reveal significant information about the real world. Subspace-based outlier detection is regarded as a feasible approach in very high dimensional space. However, the outliers found in subspaces are only part of the true outliers in high dimensional space. The outliers hidden in normal-clustered points are sometimes neglected i...


A Geometric Analysis of Subspace Clustering with Outliers

This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision and unsupervised learning applications, we do not know in advance how many subspaces there are, nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspac...


Isotropic Constant Dimension Subspace Codes

In the network coding setting, a constant dimension code is a set of k-dimensional subspaces of F_q^n. If F_q^n is a nondegenerate symplectic vector space with bilinear form f, an isotropic subspace U of F_q^n is a subspace such that f(x, y) = 0 for all x, y ∈ U. We introduce isotropic subspace codes simply as sets of isotropic subspaces and show how the isotropic property is used in the decoding process, then...
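The isotropy condition above can be verified on a basis alone: since f is bilinear, f(x, y) vanishes for all x, y ∈ U exactly when the Gram matrix of a basis of U vanishes. A minimal check over F_2 (the specific form J and the subspaces are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Standard symplectic form on F_2^4, f(x, y) = x^T J y mod 2
# (over F_2, -I equals I, so J is symmetric here).
J = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

def is_isotropic(basis, J, q=2):
    """U = span(rows of `basis`) is isotropic iff the Gram matrix
    basis @ J @ basis.T vanishes mod q; bilinearity covers all x, y in U."""
    return not np.any((basis @ J @ basis.T) % q)

U_good = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])   # span{e1, e2}: isotropic
U_bad  = np.array([[1, 0, 0, 0], [0, 0, 1, 0]])   # span{e1, e3}: f(e1, e3) = 1
assert is_isotropic(U_good, J)
assert not is_isotropic(U_bad, J)
```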



Journal:
  • CoRR

Volume: abs/1012.4116  Issue: -

Pages: -

Publication year: 2010